Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

local: fixes a data race in anti-entropy sync #12324

Merged
merged 7 commits into from
Feb 14, 2022
Merged

local: fixes a data race in anti-entropy sync #12324

merged 7 commits into from
Feb 14, 2022

Conversation

rboyer
Copy link
Member

@rboyer rboyer commented Feb 11, 2022

The race detector noticed this initially in TestAgentConfigWatcherSidecarProxy but it is not restricted to just tests.

The two main changes here were:

  • ensure that before we mutate the internal agent/local representation of a Service (for tags or VIPs) we clone those fields
  • ensure that there's no function argument joint ownership between the caller of a function and the local state when calling AddService, AddCheck, and related using copystructure for now.
Failed
=== RUN   TestAgentConfigWatcherSidecarProxy
[INFO] freeport: detected ephemeral port range of [32768, 60999]
[INFO] freeport: reducing max blocks from 30 to 15 to avoid the ephemeral port range
==================
WARNING: DATA RACE
Write at 0x00c000d7e958 by goroutine 130:
  github.com/hashicorp/consul/agent/local.(*State).updateSyncState()
      /home/circleci/project/agent/local/state.go:1109 +0x109e
  github.com/hashicorp/consul/agent/local.(*State).SyncFull()
      /home/circleci/project/agent/local/state.go:1191 +0x30
  github.com/hashicorp/consul/agent/ae.(*StateSyncer).nextFSMState()
      /home/circleci/project/agent/ae/ae.go:178 +0xa2
  github.com/hashicorp/consul/agent/ae.(*StateSyncer).nextFSMState-fm()
      /home/circleci/project/agent/ae/ae.go:171 +0x4d
  github.com/hashicorp/consul/agent/ae.(*StateSyncer).runFSM()
      /home/circleci/project/agent/ae/ae.go:164 +0x89
  github.com/hashicorp/consul/agent/ae.(*StateSyncer).Run()
      /home/circleci/project/agent/ae/ae.go:158 +0x48
  github.com/hashicorp/consul/agent.(*Agent).StartSync·dwrap·28()
      /home/circleci/project/agent/agent.go:1616 +0x39

Previous read at 0x00c000d7e958 by goroutine 167:
  github.com/hashicorp/consul/agent.buildAgentService()
      /home/circleci/project/agent/agent_endpoint.go:260 +0x1a4
  github.com/hashicorp/consul/agent.(*HTTPHandlers).AgentService.func1()
      /home/circleci/project/agent/agent_endpoint.go:448 +0x29d
  github.com/hashicorp/consul/agent.(*Agent).LocalBlockingQuery()
      /home/circleci/project/agent/agent.go:3842 +0x2d0
  github.com/hashicorp/consul/agent.(*HTTPHandlers).AgentService()
      /home/circleci/project/agent/agent_endpoint.go:423 +0x411
  github.com/hashicorp/consul/agent.(*HTTPHandlers).handler.func3()
      /home/circleci/project/agent/http.go:292 +0x65
  github.com/hashicorp/consul/agent.(*HTTPHandlers).wrap.func1()
      /home/circleci/project/agent/http.go:548 +0x1029
  github.com/hashicorp/consul/agent.(*HTTPHandlers).handler.func1.1()
      /home/circleci/project/agent/http.go:226 +0xcb
  net/http.HandlerFunc.ServeHTTP()
      /usr/local/go/src/net/http/server.go:2047 +0x4d
  github.com/NYTimes/gziphandler.GzipHandlerWithOpts.func1.1()
      /home/circleci/go/pkg/mod/github.com/!n!y!times/gziphandler@v1.0.1/gzip.go:287 +0x433
  net/http.HandlerFunc.ServeHTTP()
      /usr/local/go/src/net/http/server.go:2047 +0x4d
  net/http.(*ServeMux).ServeHTTP()
      /usr/local/go/src/net/http/server.go:2425 +0xc5
  github.com/hashicorp/go-cleanhttp.PrintablePathCheckHandler.func1()
      /home/circleci/go/pkg/mod/github.com/hashicorp/go-cleanhttp@v0.5.1/handlers.go:42 +0xc1
  net/http.HandlerFunc.ServeHTTP()
      /usr/local/go/src/net/http/server.go:2047 +0x4d
  github.com/hashicorp/consul/agent.(*wrappedMux).ServeHTTP()
      /home/circleci/project/agent/http.go:155 +0x69
  net/http.serverHandler.ServeHTTP()
      /usr/local/go/src/net/http/server.go:2879 +0x89a
  net/http.(*conn).serve()
      /usr/local/go/src/net/http/server.go:1930 +0x12e4
  net/http.(*Server).Serve·dwrap·87()
      /usr/local/go/src/net/http/server.go:3034 +0x58

Goroutine 130 (running) created at:
  github.com/hashicorp/consul/agent.(*Agent).StartSync()
      /home/circleci/project/agent/agent.go:1616 +0xb8
  github.com/hashicorp/consul/agent.(*TestAgent).Start()
      /home/circleci/project/agent/testagent.go:230 +0xcaa
  github.com/hashicorp/consul/agent.StartTestAgent.func1()
      /home/circleci/project/agent/testagent.go:106 +0x53
  github.com/hashicorp/consul/sdk/testutil/retry.run.func2()
      /home/circleci/project/sdk/testutil/retry/retry.go:148 +0x58
  github.com/hashicorp/consul/sdk/testutil/retry.run()
      /home/circleci/project/sdk/testutil/retry/retry.go:149 +0xfe
  github.com/hashicorp/consul/sdk/testutil/retry.RunWith()
      /home/circleci/project/sdk/testutil/retry/retry.go:108 +0x72
  github.com/hashicorp/consul/agent.StartTestAgent()
      /home/circleci/project/agent/testagent.go:104 +0x13b
  github.com/hashicorp/consul/connect/proxy.TestAgentConfigWatcherSidecarProxy()
      /home/circleci/project/connect/proxy/config_test.go:90 +0x164
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1259 +0x22f
  testing.(*T).Run·dwrap·21()
      /usr/local/go/src/testing/testing.go:1306 +0x47

Goroutine 167 (running) created at:
  net/http.(*Server).Serve()
      /usr/local/go/src/net/http/server.go:3034 +0x847
  github.com/hashicorp/consul/agent.newAPIServerHTTP.func1()
      /home/circleci/project/agent/apiserver.go:104 +0x78
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      /home/circleci/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 +0x96

@rboyer rboyer requested a review from a team February 11, 2022 20:46
@rboyer rboyer self-assigned this Feb 11, 2022
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging February 11, 2022 20:48 Inactive
@vercel vercel bot temporarily deployed to Preview – consul February 11, 2022 20:48 Inactive
@rboyer rboyer added theme/testing Testing, and related enhancements and removed theme/testing Testing, and related enhancements labels Feb 11, 2022
@rboyer
Copy link
Member Author

rboyer commented Feb 11, 2022

Related: #12028 & #12041

Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! I think @kisunji has a lot more context here, so I'll wait on his review.

Does this depend on both previous fixes, or can we roll back some of that?

agent/local/state.go Outdated Show resolved Hide resolved
.changelog/12324.txt Outdated Show resolved Hide resolved
@vercel vercel bot temporarily deployed to Preview – consul February 11, 2022 22:37 Inactive
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging February 11, 2022 22:37 Inactive
@vercel vercel bot temporarily deployed to Preview – consul February 11, 2022 22:38 Inactive
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging February 11, 2022 22:38 Inactive
@vercel vercel bot temporarily deployed to Preview – consul February 11, 2022 22:50 Inactive
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging February 11, 2022 22:50 Inactive
@rboyer
Copy link
Member Author

rboyer commented Feb 11, 2022

Does this depend on both previous fixes, or can we roll back some of that?

Parts of the prior fixes are still relevant so I think it makes more sense to roll forward.

@@ -1083,31 +1083,35 @@ func (l *State) updateSyncState() error {
continue
}

// Make a shallow copy since we may mutate it below and other readers
// may be reading it and we want to avoid a race.
nextService := *ls.Service
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still copies even if it's not going to mutate it for 1 of the 2 reasons below, but I sense that the clarity this provides in reading might be fine given the slightly increased allocations in the AE loop.

agent/local/state.go Outdated Show resolved Hide resolved
@rboyer rboyer requested a review from kisunji February 11, 2022 22:57
@kisunji kisunji linked an issue Feb 14, 2022 that may be closed by this pull request
@vercel vercel bot temporarily deployed to Preview – consul February 14, 2022 16:22 Inactive
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging February 14, 2022 16:22 Inactive
@rboyer rboyer requested a review from kisunji February 14, 2022 16:24
@rboyer rboyer changed the title fix data race in TestAgentConfigWatcherSidecarProxy local: fixes a data race in anti-entropy sync Feb 14, 2022
@rboyer rboyer merged commit fa4577d into main Feb 14, 2022
@rboyer rboyer deleted the fix-data-race branch February 14, 2022 16:41
@hc-github-team-consul-core
Copy link
Contributor

🍒 If backport labels were added before merging, cherry-picking will start automatically.

To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/582260.

@hc-github-team-consul-core
Copy link
Contributor

🍒❌ Cherry pick of commit fa4577d onto release/1.11.x failed! Build Log

rboyer added a commit that referenced this pull request Feb 14, 2022
rboyer added a commit that referenced this pull request Feb 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data race in local state sync
4 participants